A Portable Method for Finding User Errors in the Usage of MPI Collective Operations

Authors

  • Christopher Falzone
  • Anthony Chan
  • Ewing L. Lusk
  • William Gropp
Abstract

An MPI profiling library is a standard mechanism for intercepting MPI calls made by applications. Profiling libraries are so named because they are commonly used to gather runtime information about performance characteristics. Here we present a profiling library whose purpose is to detect user errors in the use of MPI’s collective operations. Some errors can be detected locally (by a single process), but errors involving the consistency of arguments passed to MPI collective functions must be tested for in a collective fashion. While the idea of using such a profiling library does not originate here, we take the idea further than it has been taken before (we detect more errors, including those involving datatype inconsistencies) and present an open-source library that can be used with any MPI implementation. We describe the tests carried out, provide some details of the implementation, illustrate the usage of the library, and present performance tests.
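The distinction between local and collective checks can be illustrated with a small simulation. This is a minimal sketch, not the library's code: it runs in a single Python process without MPI, and the names `BcastArgs` and `check_collective` are hypothetical. It also assumes a simplified rule in which every rank must pass the same count and datatype name, whereas real MPI only requires matching type signatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BcastArgs:
    """Arguments one rank passes to a broadcast-style call (hypothetical model)."""
    root: int
    count: int
    datatype: str  # a plain name standing in for an MPI datatype signature


def check_collective(calls):
    """Return error messages for a list of per-rank arguments, indexed by rank.

    A real checker would exchange rank 0's values between processes; here
    all arguments are already in one list, so the comparison is direct.
    """
    errors = []
    size = len(calls)
    reference = calls[0]  # rank 0's arguments serve as the reference
    for rank, args in enumerate(calls):
        # Local checks: each rank can verify these without communication.
        if not 0 <= args.root < size:
            errors.append(f"rank {rank}: root {args.root} is out of range")
        if args.count < 0:
            errors.append(f"rank {rank}: negative count {args.count}")
        # Collective checks: these need another rank's values to compare against.
        if args.root != reference.root:
            errors.append(
                f"rank {rank}: root {args.root} differs from rank 0's root {reference.root}"
            )
        if (args.count, args.datatype) != (reference.count, reference.datatype):
            errors.append(
                f"rank {rank}: {args.count} x {args.datatype} differs from "
                f"rank 0's {reference.count} x {reference.datatype}"
            )
    return errors
```

Calling `check_collective` with identical arguments on every rank returns an empty list. A mismatched root or datatype on one rank yields a message naming that rank, and no rank could have caught it from its own arguments alone.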

Similar resources

Collective Error Detection for MPI Collective Operations

An MPI profiling library is a standard mechanism for intercepting MPI calls by applications. Profiling libraries are so named because they are commonly used to gather performance data on MPI programs. Here we present a profiling library whose purpose is to detect user errors in the use of MPI’s collective operations. While some errors can be detected locally (by a single process), other errors ...

Optimizing a Conjugate Gradient Solver with Non-Blocking Collective Operations

This paper presents a case study about the applicability and usage of non-blocking collective operations. These operations provide the ability to overlap communication with computation and to avoid unnecessary synchronization. We introduce our NBC library, a portable low-overhead implementation of non-blocking collectives on top of MPI-1. We demonstrate the easy usage of the NBC library with th...

Verifying Collective MPI Calls

The collective communication operations of MPI, and in general MPI operations with non-local semantics, require the processes participating in the calls to provide consistent parameters, e.g., a unique root process, matching type signatures and amounts for data to be exchanged, or the same operator. Exhaustive consistency checks are typically too expensive to perform under normal use of MPI and would...

A Case for Non-blocking Collective Operations

Non-blocking collective operations for MPI have been in discussion for a long time. We want to contribute to this discussion, give a rationale for the usage of these operations, and assess their possible benefits. A LogGP model for the CPU overhead of collective algorithms and a benchmark to measure it are provided and show a large potential to overlap communication and computation. We show ...

MPI Runtime Error Detection with MUST: Advanced Error Reports

The Message Passing Interface (MPI) is a widely used paradigm for distributed memory programming. Its API is primarily designed for good performance and less for usability; it provides only very limited abstractions that help enforce its correct use. As a result, application developers need tools that aid in the detection and removal of MPI usage errors. Our runtime error detection tool MUST ad...

Journal:
  • IJHPCA

Volume 21

Publication date: 2007